A “data sharing trust” model for rapid, collaborative science
Authors
Abstract
Complex datasets provide opportunities for discoveries beyond their initial scope. Effective and rapid data sharing and management practices are crucial to realize this potential; however, they are harder to implement than post-publication access. Here, we introduce the concept of a “data sharing trust” to maximize the value of large datasets. With the advent of new technologies and an appreciation for systems-level analyses, there is a growing number of research endeavors that generate large, multi-modal datasets. These projects often involve many investigators who bring complementary expertise in biological sub-specialties, both in generating and analyzing specific data types and in contributing clinical perspective and understanding. Such projects present an incredible opportunity for scientific advancement, but to be successful, they require data sharing, iteration, and elaboration in near real-time, throughout the planned duration and scope of the project. A key development that the current coronavirus disease 2019 (COVID-19) pandemic has brought to the forefront is the importance of real-time data sharing, bringing many eyes and insights to important questions. Furthermore, the NIH recently released a “Data Sharing and Management Policy” requiring a stated data sharing plan for all federally funded projects; this underscores the importance of the practice and prompts the community to devise practical solutions to this challenge. Here, we focus on possible approaches to move beyond traditional “access-restriction” models, in which data sharing is limited, with an emphasis on secondary analysis, typically after a first publication has already been generated. These models are inflexible, and tend to overvalue the work involved in the production of raw data and undervalue analytical interpretation. Instead, we will introduce a model that seeks to honor the personal incentives that drive the passion of scientists while enabling access to well-annotated data as early as, and ideally in concert with, its production. We highlight our application of these ideas in a COVID-19 research effort, a collaboration and data sharing trust across over 150 researchers. need further might represent right seed make valuable enterprise institutions, even state pandemic-driven, community-minded projects. Big fast emphasize concurrent production, quality-control, primary insight generation.
However, currently several barriers fluid timely among Beyond logistical constraints, including lack infrastructure efficient capture significant time effort required researchers curate datasets, three stakeholders consider when crafting agreements sharing:(1)Investigators. Publications necessary career seek contributions solidify status field; could jeopardize one’s chances or credited discovery.(2)Research Institutions. Institutions own monetize intellectual property (IP) developed by investigators. Data can lead other parties developing IP based produced investigators, therefore weakening position institution.(3)Human Subjects. For involving human subjects, strictly regulated protect patient privacy. Failure properly address concerns undermines public trust, jeopardizes participation future research, result serious costly legal consequences. The latter two derived from investigator’s home institution not extensively covered here. Briefly third, must take special care drafting informed consent forms enable, much possible, usage purposes—including request distribute de-identified additional potential industry partners, and/or repositories—with secure systems place shield protected health information respect privacy autonomy. Institutional Review Boards should engaged design programs ensure protocols at same protecting interests. main focus barrier: realities academic science expects publications—most importantly, corresponding authorships—of novel findings high-impact journals. This reality complemented very real having reasonable window discover what it sought out study place. It oversimplification expect mandate immediate broad release without addressing some way. In addition generation, invest substantial resources into conceiving organizing study. Their motivation, producing breakthroughs, requires them analyze expand upon publish insights. want share data, others, using fruits labor, adequately include discovery process assignment credit works. 
Incentives formulated convince against perceived interests (Bierer et al., 2017Bierer B.E. Crosas M. Pierce H.H. Authorship Incentive Sharing.N. Engl. J. Med. 2017; 376: 1684-1687Crossref PubMed Scopus (67) Google Scholar; Olfson 2017Olfson Wall M.M. Blanco C. Incentivizing Collaboration Medical Research-The S-Index.JAMA Psychiatry. 74: 5-6Crossref (14) Scholar), do always environment minimizes risk being “scooped within” your data. Two contrasting dominate sciences: (1) distribution publication, (2) near-real-time (Birney 2009Birney E. Hudson T.J. Green E.D. Gunter Eddy S. Rogers Harris J.R. Ehrlich S.D. Apweiler R. al.Toronto International Release WorkshopPrepublication sharing.Nature. 2009; 461: 168-170Crossref (189) Scholar). Both seen progress norms toward collaborative science. mandate, funding agencies journals commitment (Sim 2020Sim I. Stebbins Bierer Butte A.J. Drazen Dzau V. Hernandez A.F. Krumholz H.M. Lo B. Munos al.Time sharing.Science. 2020; 367: 1308-1309Crossref (15) wealth thus exists domain investigation use. limitations hinder first, tying requirements incurs time-delay between generation sharing, ranging months years. During time, have deriving likely would ideal fruitful because investigator most intently focused particular dataset. Pre-print servers like BioRxiv, MedRxiv, ChemRxiv accelerate timeline, rarely full release. addition, burdensome researcher, resulting labs keeping “two sets books:” internal version rely analysis external minimal curation after-the-fact. second set less granular thoroughly annotated “in-house” version. poses problem and, though surmountable, results more lower integrity. system curates library” collection optimal. Largely response shortcomings, others advocate opposite approach: Although approach solves problem, problematic discussed previous section. Ultimately, nuanced dissemination encourage safeguarding ecosystem. here bridge divide camps thought access-restriction models. 
begin “shells” (Figure 1A), starting producers (or “stewards”), freely collaborators (the stewards labs), followed restricted subsequent broader participating finally public. also idea succession “raw” “processed” “insight-level” categories, categories shared whereas last category dependent insight-generating studies analyses 1B). note each transition curated insight-level continually add interpretation process, distilling steady progression. There levels processed Similar embargo class then proposed schedule determined steward project leadership. Our underlying ethos essential define see within sphere (“data trust”). example, collaborating generated part project, contact dataset engage before engagement routine reporting insights, subsequently use any obliged to, minimum, offer authorship (primary investigators) members New interested accessing agree follow proposal submission read sign agreement prior Notably, although team either intra- inter-institutional, established ongoing limit total collaborators. Despite this, codification allows granted if was settled advance. At level, nature changes, understand costs benefits propose promotes integrity discovery, summarized Figure 1B. scheme, deposited platform, library,” directly instruments. By setting expectation immediately worthy shareable format, framework guarantees QC conceived start. benefit informing available offers possibility integrative stewards. Large enough exist strongly motivate go through curating depositing system. Several examples platform features incentivize create added individual include, to:(1)Support manages standardization quality-control import process;(2)Development visualization tools top once imported, rather re-exporting yet another where private;(3)Seamless integration (e.g., literature, publicly databases, pre-publication question) loaded onto readily perform cross-dataset analyses. deliver amount value, appropriate personnel devoted development. 
Accordingly, sure support such efforts, institutions support. Building only technically feasible, aligned stakeholders. Unfortunately, foundational prestigious publications, needs consideration almost exclusively tied achievement. come regularly held project-wide lab meetings equivalent—this feature extraction dimensionality reduction, observed signatures biology revealed creates atmosphere openness inclusion, forum integrate feedback own. If attendees document (see Box 1), least societal norm governs how trust.Box 1An example restrictions trustFor recent (COMET), following trust. Over ten agreed deposit bulk- single-cell sequencing, cytometry flight (CyTOF), cytokine profiles, antibody characterization, UCSF Library open signed COMET Agreement. Progress bi-weekly meetings. An excerpt Agreement reads follows:“As you repository, host labs. distinguishes Each associated “steward,” PI trust: given kept included collaborator. policy strike balance promoting respecting investment put data.”The includes clauses:•As facilitate upload repository manner (ideally 1 day 2 weeks generation). ancillary samples acquisition.•Prior I am steward, permission specified continue update my respectful collaborator.•I meeting.•I collaborator, direct survey approval, confirm received approval data.•I COMET’s policies. How working: challenges solutionsThis presented requirements, triumphs, pitfalls, solutions—it enabled series collaborations during critical moment course pandemic. Patient honoring consentBecause circumstances illness, certain hospitalized patients were enrolled under waiver consent, widely; later declined needed destroyed, consented made team. led complications management, Library.Solution: built restriction database file server patient’s “waiver,” records withheld search unless user had privileged updated “consenting,” switched unrestricted. “decline,” files automatically removed queued deletion. 
Equal samples, insightsAs multiple working leading conflicts “ownership” ideas.Solution: executive committee intervened resolve reorganize priorities domains, helping re-align distinct areas, combine manage overlap. Timely posting sharingLabs varied updates delays inequitable trust.Solution: introduced streams (project management) up-to-date meeting presentation. data.” Because Library. Solution: As ideas. Labs next steps types collaborators, form publications hosting. adaptability transparency. become explicitly expected clear understanding whom accessed. granting collaborator review outlined (contacting rules). Through entire improves “expertly curated” initially conceived. before, monitoring, minimize unequal misuse, erode use-case, implemented refined called “COVID-19 Multi-Phenotyping Therapies (COMET)” University California, San Francisco (UCSF). reflections detailed 1. component type best efforts crediting hard dedication members, those clinical, biospecimen processing, leadership teams. recommend leaders expectations outset recipe consortium attribution. regard authorship. larger culture shift inclusion formally rewriting guidelines, historically rewarded profound contribution (The Committee Journal Editors, 2020The Editors2020http://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.htmlGoogle Scholar) manuscript. increasingly inadequate world different wherein easily forgotten underestimated leaders. happening, helped along considerably author paragraphs allow explicit. re-writing code age manuscript, issue cases reward offered collaboration, integral advancement worldwide. summary, now produce enormous fully achieved mined ways groups. technology advances, generated, majority existing private, under-analyzed, therefore, under-utilized. particularly wasteful considering dwarfs single institution. Incorporating all—raw level–with years earlier society whole. doubtlessly going refinements model. 
Critically, pioneered predominantly institution—therefore safeguards institutions. researcher temporary actions recorded, sufficient protections. Another option third subsets multi-investigator developments needed, seem timely, heels century’s fifth largest global threat, start refining now.
Journal
Journal title: Cell
Year: 2021
ISSN: ['0092-8674', '1097-4172']
DOI: https://doi.org/10.1016/j.cell.2021.01.006